Plastomes of Sonchus (Asteraceae) endemic to the Atlantic Madeira archipelago: Genome structure, comparative analysis, and phylogenetic relationships

The woody Sonchus alliance, a spectacular example of adaptive radiation with six genera and approximately 31 species, is found exclusively on three Macaronesian archipelagos (Madeira, Canaries, and Cape Verdes) in the Atlantic Ocean. Four of the Sonchus taxa are restricted to Madeira, including shrubs and small trees at higher elevations (S. fruticosus and S. pinnatus), and caudex perennials in the lower coastal areas (S. ustulatus subsp. maderensis and S. ustulatus subsp. ustulatus). The Madeiran Sonchus stemmed from a single colonization event that originated from the Canaries < 3 million years ago. However, the plastome evolution and species relationships remain insufficiently explored. We therefore assembled and characterized the plastomes of four Sonchus taxa from Madeira and conducted a phylogenomic analysis. We found highly conserved plastome sequences among the taxa, further supporting a single and recent origin. We also found highly conserved plastomes among the cosmopolitan weedy Sonchus, Macaronesian Sonchus in the Atlantic, and Juan Fernández Islands Dendroseris in the Pacific. Furthermore, we identified four mutation hotspot regions (trnK-rps16, petN-psbM, ndhF-ψycf1, and ycf1) and simple sequence repeat motifs. This study strongly supports the monophyly of Madeiran Sonchus. However, its relationship with the remaining woody Sonchus alliance from the Canary Islands requires further investigation.


Introduction
Oceanic islands serve as natural laboratories that provide excellent opportunities to investigate the patterns and processes of organismal evolution [1]. The inherent geographical isolation from the continent allows ancestral island populations to become distinct from continental progenitor populations, providing an opportunity to undergo explosive lineage diversification into a wide range of available habitats within the oceanic insular environment. This phenomenon, commonly known as adaptive radiation [2], is an example of cladogenetic speciation.
ustulatus exclusively occur in Madeira. Sonchus ustulatus subsp. maderensis is more widespread, occurring on all three islands (Madeira, Porto Santo, and Desertas) [15]. Sonchus ustulatus subsp. maderensis and S. ustulatus subsp. ustulatus are commonly distributed in the northern and southern coastal areas of Madeira, respectively. Sonchus fruticosus is distributed in Laurisilva and the moist ravines in the interior of Madeira, at altitudes of 800-1200 m, while S. pinnatus is commonly distributed on rocky slopes at 1000-1400 m and lower altitudes [15].
Occurrence data of S. fruticosus (purple) and S. pinnatus (orange) were obtained from the IUCN Red List database (https://www.iucnredlist.org/species/103588658/103588672 and https://www.iucnredlist.org/species/103588709/103588713, respectively). Green areas depict the species overlap areas. The distributions of the two subspecies of S. ustulatus, namely subsp. maderensis (blue) and subsp. ustulatus (red), were based on previous reports [14]. The boundary, which is similar but not identical to the original image and is thus for illustrative purpose only, was obtained from the
The advent of next generation sequencing (NGS) has allowed for the rapid assembly and characterization of whole plastid genome sequences in numerous land plant lineages. These plastome resources have proven useful in resolving difficult and obscure phylogenetic relationships and developing efficient plastid markers for DNA barcoding and phylogeographic and phylogenetic studies [17]. In an ongoing effort to establish robust and highly resolved plastome-based phylogenetic relationships between Sonchus and related genera in the subtribe Hyoseridinae, we have characterized the plastomes of several herbaceous and woody Sonchus species [18][19][20][21][22]. In particular, the plastomes of the woody Sonchus alliance have been characterized with special emphasis on species from the Canaries [18,21]. However, information on the plastome organization and variations among the four Madeiran taxa and their relationships with the species in the Canaries remains limited. Definitive evidence for the monophyly of Madeira Sonchus based on cpDNA is also lacking. Therefore, this study aimed to: (1) determine the complete plastomes of four Madeira-endemic Sonchus taxa; (2) compare their gene content, order, and any changes in organization; (3) conduct comparative genomic analyses to identify highly variable hotspot regions and chloroplast simple sequence repeat (cpSSR) markers; and (4) elucidate the phylogenetic relationships among the four Madeira Sonchus taxa and assess their relationships with the Canary Islands congeneric taxa.

Material preparation, DNA extraction, genome sequencing, and annotation
During the 2002 expedition to Madeira, fresh material of four endemic Sonchus taxa was collected on the island (see voucher and collection information in [14]). The total genomic DNA of three taxa, S. pinnatus, S. ustulatus subsp. ustulatus, and S. ustulatus subsp. maderensis, was isolated using the Exgene™ Plant SV mini kit (GeneAll, Seoul, Korea). The DNA of S. fruticosus was extracted using the CTAB method [23]. The concentration and quality of the extracted DNA were verified using 1% agarose gel electrophoresis. Next-generation sequencing (NGS) was performed at Macrogen Corporation (Seoul, Korea), and approximately six gigabytes of raw NGS sequence data were generated for each taxon. As shown in Table 1, the depth of coverage of 500X or higher indicates that sufficient sequencing was conducted for chloroplast genome assembly.
An Illumina paired-end (PE) genomic library was constructed and sequenced using the Illumina HiSeq platform, according to the standard Illumina PE protocol in the TruSeq Nano DNA Sample Preparation Guide. The sequence reads were assembled using the de novo genomic assembler Velvet 1.2.10 [24]. In the case of S. pinnatus, we performed PCR confirmation for three ambiguous regions (two in the large single copy, LSC, and one in the small single copy, SSC) to assemble the circular plastid genome. PCR gap filling was performed using the Inclone™ Taq DNA polymerase kit (IncloneBiotech Co., Yongin, Korea), with a final volume of 50 μL. The PCR product was purified using the Inclone™ Gel & PCR Purification Kit (Inclonebiotech Co., Yongin, Korea) and sequenced at Macrogen Corporation (Seoul, Korea).

Identification of highly divergent regions
We estimated the nucleotide diversity of woody Sonchus species and analyzed the DNA polymorphisms using DnaSP v. 6 [30]. Nucleotide diversity (Pi) was computed in a sliding window analysis with a window length of 100 bp and a step size of 25 bp. We ran the mVISTA program (http://genome.lbl.gov/vista/mvista/; [31,32]) in LAGAN alignment mode to compare the four Madeira Sonchus taxa, with S. fruticosus as the reference. The four plastid genomes were aligned using MAFFT v. 7 (https://mafft.cbrc.jp/alignment/software/; [33]), and gene orientation was checked using Geneious software. The window size and resolution were 100 bp and 60 bp, respectively.
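The sliding window calculation described above can be illustrated with a minimal Python sketch. It is not the DnaSP implementation: the function names are hypothetical, Pi is taken as the mean number of pairwise differences per site, and every window is divided by its full length, whereas DnaSP may treat gapped or missing sites differently.

```python
from itertools import combinations

VALID = set("ACGT")

def pairwise_diff(a, b):
    """Count aligned sites where two sequences differ (gaps/ambiguities skipped)."""
    return sum(1 for x, y in zip(a, b) if x in VALID and y in VALID and x != y)

def nucleotide_diversity(seqs):
    """Pi: mean pairwise differences per site (simplified, see assumptions above)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_diff(a, b) for a, b in pairs)
    return total / (len(pairs) * length)

def sliding_pi(seqs, window=100, step=25):
    """Return (window midpoint, Pi) for each window along an alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results

# Toy alignment (not real plastome data): one substitution at position 100.
toy = ["A" * 200, "A" * 200, "A" * 100 + "C" + "A" * 99]
for midpoint, pi in sliding_pi(toy):
    print(midpoint, round(pi, 5))
```

Windows whose Pi exceeds a chosen threshold (0.01 in this study) would then be flagged as candidate hotspots.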

Codon usage bias, RNA editing site, and genes under positive selection
The codon usage bias was calculated using MEGA-X [35]. We compared the codon distribution and relative synonymous codon usage (RSCU) values of the study group. Moreover, we predicted RNA editing sites in the Madeiran Sonchus species using PREP-Cp (http://prep.unl.edu/; [36]) with default settings (cutoff value of 0.8). Each Madeira Sonchus had 87 protein-coding sequences (CDSs). Only 35 of these CDSs were analyzed because the remaining 52 plastid genes were not supported by PREP-Cp. We calculated the Ka/Ks ratio for each pair of Madeira Sonchus taxa using DnaSP v. 6 [30] to evaluate selective pressure.
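For readers unfamiliar with RSCU, the following Python sketch shows the usual definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. This is an illustration under stated assumptions, not the MEGA-X routine; the function name `rscu` is hypothetical, the standard codon assignments are assumed, and stop codons are excluded.

```python
from collections import Counter
from itertools import product

# Standard codon assignments, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rscu(cds_list):
    """Return {codon: RSCU} computed over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue
        expected = total / len(codons)  # equal usage of synonymous codons
        for c in codons:
            values[c] = counts[c] / expected
    return values

# Toy sequence (not from the plastomes): leucine is encoded mostly by TTA.
toy = rscu(["TTATTATTACTGGCTGCTGCCGCC"])
print(toy["TTA"], toy["CTG"], toy["GCT"])
```

An RSCU above 1 means a codon is used more often than expected under equal synonymous usage, and a value below 1 means it is used less often.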

Phylogenetic analysis
To determine the phylogenetic position and relationships of the four Madeiran Sonchus taxa, we selected a total of 18 complete chloroplast genomes of the subtribe Hyoseridinae (formerly known as Sonchinae). Reichardia ligulata (MN893255), which diverged early in Hyoseridinae, was used as the outgroup. We selected herbaceous Sonchus spp., including S. boulosii . We aligned the chloroplast genomes using MAFFT v. 7 with default settings and edited the alignment manually using Geneious software. A maximum-likelihood (ML) tree and a Bayesian tree were inferred using IQ-TREE v. 1.6.12 [37] and MrBayes v. 3.2.7 [38], respectively, with the alignment partitioned according to the gene annotations. The best-fit substitution model for each partition was chosen according to the Bayesian information criterion (BIC) score and weight, using ModelFinder [39] implemented in IQ-TREE. The branch bootstrap support (BS) of the ML tree was calculated with 1000 bootstrap replicates. For Bayesian inference, the Markov chain Monte Carlo (MCMC) was run with four chains for 2,000,000 generations, sampling a tree every 100 generations and discarding the first 25% of the sampled trees as burn-in. We also conducted a maximum parsimony (MP) analysis using PAUP version 4.0a [40]. The default heuristic search options included starting trees via stepwise simple sequence addition with one tree held at each step, the tree-bisection-reconnection (TBR) branch-swapping algorithm, steepest descent, and MulTrees option in effect, zero branch length collapsed, and topological constraints not enforced. BS was calculated from 1000 replicates using the same heuristic search options to evaluate the robustness of the groups.
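As a small illustration of model choice by BIC score and weight, the Python sketch below uses the textbook definitions: BIC = -2 ln L + k ln n, and a weight proportional to exp(-0.5 × ΔBIC). It is not ModelFinder itself; the function names, model labels, log-likelihoods, and parameter counts are hypothetical placeholders.

```python
import math

def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: lower values indicate a preferred model."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)

def bic_weights(scores):
    """Convert {model: BIC} into relative weights that sum to 1."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# Hypothetical candidates for one partition: (log-likelihood, free parameters).
candidates = {"model_A": (-2510.4, 10), "model_B": (-2498.9, 14), "model_C": (-2497.5, 19)}
n_sites = 1200  # hypothetical partition length

scores = {m: bic(lnl, k, n_sites) for m, (lnl, k) in candidates.items()}
weights = bic_weights(scores)
best_model = min(scores, key=scores.get)
print(best_model, {m: round(w, 3) for m, w in weights.items()})
```

The penalty term k ln n is why a model with a slightly better likelihood but many more parameters can still lose to a simpler one.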

Comparative genome analysis of four Sonchus taxa according to gene content, order, and organization
Each Madeira chloroplast genome was composed of 131 CDSs, 87 genes, 37 tRNAs, and six rRNAs. The plastomes (Fig 2) were assembled with high depths of coverage, ranging from 537X (S. pinnatus) to 2770X (S. fruticosus). The sizes of the LSC, SSC, and IR were nearly identical among the four plastomes (Table 2). In particular, two morphologically distinct shrubby species, S. fruticosus and S. pinnatus, presented identical chloroplast genome sizes with nearly identical sequences, except for three point mutations, one in each of the clpP intron and the trnK-rps16 and psaA-ycf3 intergenic spacers. The four Madeira taxa had identical gene compositions (Table 1). Seven genes (petB, atpF, ndhA, rpl2, rpl16, rps16, and rpoC1) contained a single intron and three genes (clpP, rps12, and ycf3) contained two introns (Table 2). We found little variation in the border positions of the LSC, SSC, and IR regions among the chloroplast genomes of Sonchus species in Madeira and the Canary Islands (Fig 3). Among the Madeira Sonchus, S. ustulatus subsp. maderensis differed from the remaining taxa in the ycf1 region at the SSC/IRb border. For the Canary Islands Sonchus plastomes, S. webbii contained a 6 bp gap between ψycf1 and ndhF, while the others contained a 15 bp gap at the SSC/IRa border. The ψycf1 gene of S. webbii (480 bp) was slightly longer than that of the others (471 bp).

Simple sequence repeats, codon usage bias, and RNA editing sites
We investigated SSRs among the four Madeira Sonchus plastomes and found a similar total number of SSRs (Fig 4). A total of 71 SSRs were found in S. fruticosus and S. ustulatus subsp. ustulatus, and 70 in S. pinnatus and S. ustulatus subsp. maderensis (Fig 4A). Most SSRs were detected in the coding regions (53%), and the most frequent SSR type was trinucleotides (81%) (Fig 4A). The LSC region contained the most SSRs (46-47 SSRs; 58-59%), compared to the SSC (15 SSRs; 21%) and IR (9 SSRs; 12.8%) regions (Fig 4B). All four plastomes had an equal number of mononucleotide, dinucleotide, and tetranucleotide repeats, but different numbers of trinucleotide and hexanucleotide repeats (Fig 4A). In terms of trinucleotide motifs, S. pinnatus contained 64 repeats, whereas the remaining taxa contained 65 repeats. Additionally, S. ustulatus subsp. maderensis did not contain a hexanucleotide repeat, whereas the remaining taxa each had one hexanucleotide repeat. Interestingly, we found identical frequencies of SSR distribution and types between the two morphologically divergent S. fruticosus and S. ustulatus subsp. ustulatus.
Analysis of relative synonymous codon usage (RSCU) based on the protein-coding genes revealed an average codon usage ranging from 22,774 (S. ustulatus subsp. maderensis) to 22,776 (S. fruticosus, S. pinnatus, and S. ustulatus subsp. ustulatus) among the four Madeira Sonchus taxa, with consistent patterns of frequently used codons among them (S1 Table). The highest RSCU value was observed for the UUA codon for leucine (1.9), followed by AGA for arginine (1.41-1.86) and GCU for alanine (1.78) (Fig 5). The lowest RSCU values were observed for the AGC codon for serine (0.33), and for CUG and CUC for leucine (0.37).

PLOS ONE
We also predicted RNA editing sites in the Madeira Sonchus species. All species had 48 RNA editing sites with the same cutoff value of 0.8 (S2 Table). These editing sites were present in 20 of the 35 protein-coding genes. Nine RNA editing sites were found in ndhB, five in ndhD, four each in accD and rpoC1, three each in matK and ndhA, and two each in atpA, ccsA, ndhG, petB, rpoC2, and rps14. One editing site was identified in each of atpI, ndhF, psbF, psbL, rpl20, rpoA, rpoB, and rps2. All nucleotide changes were cytosine (C) to thymine (T) transitions. The most frequent amino acid changes were conversions from proline (P) to leucine (L).

Sequence divergence and mutation hotspot regions
We calculated the nucleotide diversity of eight plastomes of the woody Sonchus alliance on the Macaronesian Islands (i.e., Madeira and the Canary Islands) using DnaSP (Fig 6A and 6B). The average nucleotide diversity (Pi) was 0.0005 and ranged from 0 to 0.015. The SSC region had the highest nucleotide diversity (Pi = 0.00117), whereas each of the two IR regions had the lowest (Pi = 0.0001). The LSC region presented a nucleotide diversity of 0.00058. For the eight Macaronesia Sonchus plastomes, we found four mutation hotspots with Pi values of ≥ 0.01, including two intergenic regions (trnK-rps16 and petN-psbM) in the LSC, and one protein-coding (ycf1) and one intergenic region (ndhF-ψycf1) in the SSC (Fig 6B). We also found four highly variable regions (i.e., trnK-rps16, psbI-trnS, clpP intron, and ycf1) when only Madeiran species plastomes were considered (Fig 6A). The mVISTA plots, with S. fruticosus as a reference, showed a high degree of synteny and gene order conservation among three of the Madeira Sonchus plastomes (Fig 7). We found 56 polymorphic sites. Not surprisingly, the intergenic spacer region (IGS) contained 55.4% of the total mutations, while the exon and intron regions contained 23.2% and 21.4%, respectively (Table 3). Approximately 55% of the polymorphisms were base substitutions, 29% were

Monophyly and species relationships in Madeira archipelago
The topologies of the ML tree and the Bayesian inference (BI) tree were identical, and both strongly suggested the monophyly of the woody Sonchus alliance in the Macaronesian Islands (100% BS and 1.0 posterior probability (PP)) (Fig 8). The MP tree (not shown) also strongly supported the monophyly of Sonchus and its relationships with other representative genera of the subtribe Hyoseridinae. It is highly likely that the woody Sonchus alliance from the Macaronesian Islands was derived from a common ancestor. Also, the Madeira Sonchus formed a monophyletic clade based on the complete plastome sequences (100% BS). The two subspecies of S. ustulatus, namely subsp. ustulatus and subsp. maderensis, are not in a monophyletic clade. Rather, subsp. ustulatus shares a more recent common ancestor with S. pinnatus and S. fruticosus than with conspecific subsp. maderensis (Fig 8).
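The notion of monophyly used in this paragraph can be made concrete with a short Python sketch. The nested-tuple tree below is a toy encoding of the branching order described in the text for the four Madeira taxa, with a placeholder outgroup; it is not the full tree of Fig 8, and the function names are hypothetical. A set of taxa is monophyletic on a rooted tree if some clade contains exactly those taxa and no others.

```python
def leaves(node):
    """Return the set of tip labels below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def clades(node):
    """Yield the tip set of every clade in a rooted nested-tuple tree."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_monophyletic(tree, taxa):
    """True if some clade in the tree contains exactly the given taxa."""
    return frozenset(taxa) in set(clades(tree))

# Toy topology following the relationships stated in the text (outgroup is a placeholder).
tree = ("outgroup",
        ("S_ustulatus_subsp_maderensis",
         ("S_ustulatus_subsp_ustulatus",
          ("S_pinnatus", "S_fruticosus"))))

madeira = {"S_ustulatus_subsp_maderensis", "S_ustulatus_subsp_ustulatus",
           "S_pinnatus", "S_fruticosus"}
subspecies = {"S_ustulatus_subsp_maderensis", "S_ustulatus_subsp_ustulatus"}

print(is_monophyletic(tree, madeira))     # the Madeira taxa form one clade
print(is_monophyletic(tree, subspecies))  # the two subspecies do not
```

On this toy topology, the smallest clade containing both subspecies also contains S. pinnatus and S. fruticosus, which is why S. ustulatus as currently circumscribed is not recovered as monophyletic.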

Madeiran Sonchus chloroplast genomes
We found highly conserved plastomes based on their size, gene content, order, and organization among the four Madeira Sonchus. The Madeira Sonchus species (152,410 bp-152,426 bp) had slightly larger plastomes than the weedy Sonchus (151,849 bp-151,967 bp) and the Canary Islands Sonchus species (152,071 bp-152,406 bp). However, the gene content and organization were highly conserved among the groups [20,21]. The woody Sonchus alliance from Madeira and the Canaries had slightly larger plastomes than the continental herbaceous congeneric species. In addition, very similar plastome sizes (152,199 bp-152,619 bp for Dendroseris and 152,071 bp-152,406 bp for the woody Sonchus alliance), gene content, and organization were observed between the Atlantic Ocean diploid Sonchus endemics and the Pacific Ocean Juan Fernández Islands tetraploid Dendroseris, which belongs to the same subtribe Hyoseridinae as Sonchus [42]. This indicates that, even though these insular endemics from the Pacific and Atlantic Oceans are morphologically and ecologically divergent, they have highly conserved plastomes that suggest their recent independent origins. Using the same SSR search parameters (1-15, 2-5, 3-3, 4-3, 5-3, and 6-3) as for the weedy Sonchus (S. asper, S. oleraceus; [20]) and the woody Sonchus alliance in the Canaries (S. canariensis, S. acaulis, S. webbii; [21]), slightly fewer SSRs were found among the Madeira Sonchus: 71 versus 79 for the weedy Sonchus, 80 for S. acaulis, 78 for S. canariensis, and 78 for S. webbii. Most of the trinucleotide motifs found in the Madeira Sonchus plastomes coincided with those of the weedy and Canary Islands Sonchus species [20,21]. We found a similar number of cpSSRs (71-74), with trinucleotide repeats being the most frequent type (87%), in the Juan Fernández Islands endemic Dendroseris [42]. However, the LSC region contained the most SSRs (62%) in Dendroseris.
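The SSR search parameters quoted above pair a repeat unit length with a minimum number of tandem copies (for example, 3-3 means a trinucleotide motif repeated at least three times). A minimal Python sketch of such a search follows. It is an illustration only and does not reproduce the SSR tool used in the study; the function names are hypothetical, and compound or interrupted repeats are not handled.

```python
# Unit length -> minimum number of tandem copies, as quoted in the text.
MIN_REPEATS = {1: 15, 2: 5, 3: 3, 4: 3, 5: 3, 6: 3}

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    rules = MIN_REPEATS if min_repeats is None else min_repeats
    seq = seq.upper()
    hits = []
    for k, n in sorted(rules.items()):
        i = 0
        while i + k * n <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                copies = 1
                while seq[i + copies * k:i + (copies + 1) * k] == motif:
                    copies += 1
                if copies >= n:
                    hits.append((i, motif, copies))
                    i += copies * k  # continue after the repeat just reported
                    continue
            i += 1
    return sorted(hits)

# Toy sequence (not plastome data) with one mono-, one tri-, and one dinucleotide SSR.
toy = "G" * 5 + "A" * 15 + "CC" + "ATT" * 3 + "GG" + "AT" * 5 + "C"
for start, motif, copies in find_ssrs(toy):
    print(start, motif, copies)
```

The primitive-motif check prevents a homopolymer run from being counted again as a di- or trinucleotide repeat, which would otherwise inflate the totals compared across taxa.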
The cpSSR markers identified in this study will be useful and important resources for population genetics and phylogeographic studies. The codon type distribution was consistent (Fig 5). The codon usage bias toward a high RSCU value of U and A at the third codon position was also noted, as previously reported [43][44][45]. The number of RNA editing sites in the Madeira Sonchus (48 sites) was considerably lower than that in Sonchus asper (98 sites; [20]) and the mostly woody species of Dendroseris (93-104 sites; [42]). In general, the most frequent RNA editing sites were found in ndhB and ndhD. The Madeira Sonchus presented a consistent pattern also found in previous studies [46][47][48]. Two regions, trnK-rps16 and ycf1, were consistently identified as highly variable regions in the cosmopolitan weedy Sonchus [20] and Macaronesia endemic Sonchus [21]. As maternally inherited chloroplast markers, these mutation hotspots will be useful for population or phylogenetic studies of the woody Sonchus alliance in Macaronesia and of widely distributed Sonchus and related species. Substantially fewer polymorphic sites were found in the Madeira Sonchus (56 polymorphic sites) than in the three Canary Islands Sonchus species (206 polymorphic sites; [21]) and the herbaceous weedy Sonchus species (528 polymorphic sites; [20]). This further supports the previous view that, despite their morphological and ecological differentiation, the Madeira Sonchus species are genetically highly similar, suggesting their recent origin from a common ancestor, most likely from the Canary Islands [12,14,49].

Phylogenetic relationships
The phylogenetic tree recovered the woody Sonchus alliance in the Macaronesian Islands as a monophyletic clade. Although the current study was based on a limited number of species, it is most likely that the woody Sonchus alliance from the Macaronesian Islands was derived from a common ancestor, as previously reported [12,14,41]. In contrast to previous studies [14], the monophyly of the Madeira Sonchus was strongly supported in all ML, BI, and MP analyses based on the complete plastome sequences. In those studies, the monophyly of the Madeira clade and the interspecific relationships were only weakly supported by the nrDNA ITS (65% and 71%, respectively) and a few chloroplast regions (69% and 60%, respectively). One novel finding of this study is that the two conspecific subspecies of S. ustulatus are not closely related. Rather, S. ustulatus subsp. ustulatus shares a more recent common ancestor with S. pinnatus and S. fruticosus than with conspecific S. ustulatus subsp. maderensis. The previous ITS tree weakly suggested that the two subspecies from coastal areas, without forming a monophyletic group, represent basal lineages that first diverged within Madeira [14].
The plastome-based phylogenetic analysis of Madeira Sonchus suggests that the two currently recognized subspecies of S. ustulatus may represent distinct taxonomic entities, warranting recognition at the species level. Sonchus ustulatus subsp. maderensis was proposed by Aldridge [50], based on its morphological differences and geographical distribution. Sonchus ustulatus subsp. maderensis occurs in moist rocky areas on the northern coast of Madeira (and rarely in Porto Santo and Desertas). In contrast, S. ustulatus subsp. ustulatus occurs in dry rocky and sunny areas on the southern coast of Madeira. These taxa are small, herbaceous perennials with acaulous or short, subwoody stems that can reach up to 30 cm in height. In contrast, S. pinnatus and S. fruticosus are perennial shrubs that reach heights of up to 2 m and 4 m, respectively. Therefore, given the species distribution and chloroplast phylogenomic tree topology, it is highly likely that the common ancestor of the Madeiran Sonchus was somewhat similar to S. ustulatus. These ancestors were likely herbaceous perennials that first colonized the northern coastal areas of Madeira and then diversified into laurel forests and moist ravines in the interior of Madeira, becoming tall shrubs. The species-level recognition of the two subspecies is further supported by a recent report of a new species, S. parathalassius J.G.Costa ex R.Jardim & M. Seq., from Porto Santo and the taxonomic recognition of S. latifolia (= S. ustulatus subsp. maderensis) [51]. Overall, new species-level recognition and interspecific relationships in the Madeira archipelago require further independent confirmation based on highly variable genome-wide nuclear markers.
The monophyly of the Madeira Sonchus was strongly established in this study, but the identity of the closely related or progenitor Sonchus species from the Canary Islands remains unclear. Based on the limited Canary Islands Sonchus samples in this study, the plastome tree suggested that S. acaulis and S. canariensis are sisters to the Madeira clade, suggesting a possible origin from Tenerife and Gran Canaria. The previous cpDNA phylogeny provided insufficient insight into this question because of a lack of resolution within subg. Dendrosonchus and genus Taeckholmia [14]. The more resolved nuclear ITS phylogeny suggested the Madeira Sonchus as a sister to the clade of some Canary Islands Sonchus from various islands, including the older central islands (Gran Canaria, Tenerife, and La Gomera) and relatively younger western islands (La Palma and El Hierro). This makes the identification of the ancestral island and species of origin of Madeira Sonchus uncertain. Nevertheless, it was suggested that the origin of Madeira Sonchus happened relatively recently (i.e., an estimated inter-archipelago dispersal event ca. 2.7 million years ago), despite the geological age (ca. 5 million years) of the Madeira archipelago [49]. An extensive phylogenomic framework, including all members of the woody Sonchus alliance in the Macaronesian Islands, is required to further determine the geographical origin of the Madeira Sonchus species.

Conclusion
This is the first study to report the complete chloroplast genomes of four Madeira Sonchus taxa, enabling the comparison of their genomes to those of the congeneric cosmopolitan Sonchus, the woody Canary Islands endemic Sonchus from the Atlantic Ocean, and the Juan Fernández Islands endemic Dendroseris in the Pacific Ocean. Although the four Madeira endemic Sonchus taxa are highly morphologically and ecologically differentiated, we found nearly identical and highly conserved chloroplast genomes, supporting a single and recent origin. We identified four mutation hotspots, namely trnK-rps16, petN-psbM, ndhF-ψycf1, and ycf1, and 70-71 cpSSRs for population genetics, phylogeography, and phylogenetic investigation. We also characterized the codon usage bias and RNA editing sites among the four Madeira taxa and compared them with those of the Canary Islands Sonchus and the Juan Fernández Islands Dendroseris. The plastome-based phylogenetic tree strongly supported the monophyly of the Madeira lineage for the first time, but its relationship with the Canary Islands congenerics requires further study based on genome-wide nuclear and chloroplast genomes. The phylogenomic tree also suggests that the herbaceous perennial S. ustulatus subsp. maderensis, which occurs on the northern coast of Madeira, was the first to diverge within the Madeira archipelago. Lastly, S. ustulatus subsp. ustulatus, a herbaceous perennial occurring on the southern coast of Madeira, is more closely related to the clade of perennial shrubs, S. pinnatus and S. fruticosus, than to conspecific S. ustulatus subsp. maderensis. This relationship corroborates the species-level recognition of S. ustulatus subsp. maderensis as a distinct species, S. latifolia. Further genome-wide investigations may clarify the relationship among Sonchus species within Madeira.
Supporting information S1